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VISION-AIDED INERTIAL NAVIGATION 


CROSS-REFERENCE TO RELATED 
APPLICATIONS 


This application claims priority under 35 U.S.C. §119(e) 
to U.S. Provisional Patent Application Nos. 61/040,473, 
filed Mar. 28, 2008, which is incorporated herein by refer- 
ence in its entirety. 


STATEMENT OF GOVERNMENT RIGHTS 


This invention was made with Government support under 
Grant Number MTP-1263201 awarded by the NASA Mars 
Technology Program, and support under Grant Number 
EIA-0324864, IIS-0643680 awarded by the National Sci- 
ence Foundation. The Government has certain rights in this 
invention. 


TECHNICAL FIELD 


This document pertains generally to navigation, and more 
particularly, but not by way of limitation, to vision-aided 
inertial navigation. 


BACKGROUND 


Existing technologies for navigation are not without 
shortcomings. For example, global positioning system 
(GPS) based navigation systems require good signal recep- 
tion from satellites in orbit. With a GPS-based system, 
navigation in urban areas and indoor navigation is some- 
times compromised because of poor signal reception. In 
addition, GPS-based systems are unable to provide close- 
quarter navigation for vehicular accident avoidance. Other 
navigation systems suffer from sensor errors (arising from 
slippage, for example) and an inability to detect humans or 
other obstacles. 


OVERVIEW 


This document discloses a system for processing visual 
information from a camera, for example, and inertial sensor 
data to provide an estimate of pose or other localization 
information. The camera provides data including images 
having a number of features visible in the environment and 
the inertial sensor provides data with respect to detected 
motion. A processor executes an algorithm to combine the 
camera data and the inertial sensor data to determine posi- 
tion, orientation, speed, acceleration or other higher order 
derivative information. 

A number of features within the environment are tracked. 
The tracked features correlate with relative movement as to 
the frame of reference with respect to the environment. As 
the number of features increases, the complexity of the data 
processing also rises. One example of the present subject 
matter uses an Extended Kalman filter (EKF) having a 
computational complexity that varies linearly with the num- 
ber of tracked features. 

In one example, a first sensor provides data for tracking 
the environmental features and a second sensor provides 
data corresponding to movement of the frame of reference 
with respect to the environment. A first sensor can include a 
camera and a second sensor can include an inertial sensor. 

Other types of sensors can also be used. Each sensor can 
be classified as a motion sensor or as a motion inferring 
sensor. One example of a sensor that directly detects motion 
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is a Doppler radar system and an example of a sensor that 
detects a parameter from which motion can be inferred is a 
camera. 

In one example, a single sensor provides data correspond- 
ing to the tracked features as well as data corresponding to 
relative movement of the frame of reference within the 
environment. Data from the single sensor can be multi- 
plexed in time, in space, or in another parameter. 

The sensors can be located on (or coupled to), either or 
both of the frame of reference and the environment. Relative 
movement as to the frame of reference and the environment 
can occur by virtue of movement of the frame of reference 
within a stationary environment or it can occur by virtue of 
a stationary frame of reference and a traveling environment. 

Data provided by the at least one sensor is processed using 
a processor that implements a filter algorithm. The filter 
algorithm, in one example, includes an EKF, however, other 
filters are also contemplated. 

In one example, a system includes a feature tracker, a 
motion sensor and a processor. The feature tracker is con- 
figured to provide feature data for a plurality of features 
relative to a frame of reference in an environment for a 
period of time. The motion sensor is configured to provide 
motion data for navigation of the frame of reference in the 
environment for the period of time. The processor is com- 
municatively coupled to the feature tracker and communi- 
catively coupled to the motion sensor. The processor is 
configured to generate at least one of navigation information 
for the frame of reference and the processor is configured to 
carry out the estimation at a computational complexity linear 
with the number of tracked features. Linear computational 
complexity is attained by simultaneously using each fea- 
ture’s measurements to impose constraints between the 
poses from which the feature was observed. This is imple- 
mented by manipulating the residual of the feature measure- 
ments to remove the effects of the feature estimate error 
(either exactly or to a good approximation). 

This overview is intended to provide an overview of 
subject matter of the present patent application. It is not 
intended to provide an exclusive or exhaustive explanation 
of the invention. The detailed description is included to 
provide further information about the present patent appli- 
cation. 


BRIEF DESCRIPTION OF THE DRAWINGS 


In the drawings, which are not necessarily drawn to scale, 
like numerals may describe similar components in different 
views. Like numerals having different letter suffixes may 
represent different instances of similar components. The 
drawings illustrate generally, by way of example, but not by 
way of limitation, various embodiments discussed in the 
present document. 

FIG. 1 illustrates a composite view of time sampled travel 
of a frame of reference relative to an environment. 

FIG. 2 illustrates a block diagram of a system. 

FIG. 3 illustrates selected images from a dataset. 

FIG. 4 illustrates an estimated trajectory overlaid on a 
map. 

FIG. 5 illustrates position, attitude and velocity for x-axis, 
y-axis, and z-axis. 


DETAILED DESCRIPTION 


FIG. 1 illustrates a composite view of five time-sampled 
positions during travel of frame of reference 15 within 
environment 20. Environment 20 includes two features 
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illustrated here as a tree and a U.S. postal mailbox, denoted 
as feature 5 and feature 10, respectively. At each of the time 
sampled points, each of the two features are observed as 
denoted in the figure. For example, at time t,, frame of 
reference 15 observes feature 5 and feature 10 along the line 
segments illustrated. At a later time t,, frame of reference 15 
again observes feature 5 and feature 10, but with a different 
perspective. In the figure, the frame of reference travels a 
curving path in view of the features. 

Navigation information as to the relative movement 
between the frame of reference 15 and the environment 20 
can be characterized using sensor data. 

Various types of sensors can be identified. A first type of 
sensor provides direct measurement of quantities related to 
motion. Data from a sensor of a second type can provide data 
by which motion can be inferred. Data can also be derived 
form a statistical model of motion. 

Asensor of the first type provides an output that is derived 
from direct measurement of the motion. For example, a 
wheel encoder provides data based on rotation of the wheel. 
Other examples include a speedometer, a Doppler radar, a 
gyroscope, an accelerometer, an airspeed sensor (such as 
pitot tube), and a global positioning system (GPS). 

A motion inferring sensor provide an output that, after 
processing, allows an inference of motion. For example, a 
sequence of camera images can be analyzed to infer relative 
motion. In addition to a camera (single or multiple), other 
examples of motion-inferring sensors include a laser scanner 
(either 2D or 3D), sonar, and radar. 

In addition to receiving data from a motion sensor and a 
motion-inferring sensor, data can also be derived from a 
statistical probabilistic model of motion. For example, a 
pattern of vehicle motion through a roadway intersection can 
provide data for the present subject matter. Additionally, 
various types of kinematic or dynamic models can be used 
to describe the motion. 

With reference again to FIG. 1, an estimate of the motion 
of frame of reference 15 at each of the time samples can be 
generated using an inertial measurement unit (IMU). 

In addition, feature 5 and feature 10 can be tracked using 
a video camera. 

FIG. 2 illustrates system 100 according to one example of 
the present subject matter. System 100 includes first sensor 
110 and second sensor 120. In one example, sensor 100 
includes a feature tracker and sensor 120 includes a motion 
sensor, a motion inferring sensor, or a motion tracking 
model. The figure illustrates two sensors, however, these can 
be combined and implemented as a single sensor, such as for 
example, a camera or a radar system, in which the function 
of sensing motion and tracking features are divided in time, 
space, or other parameter. In addition, more than two sensors 
can be provided. 

System 100 uses data derived from two sensing modali- 
ties which are represented in the figure as first sensor 110 
and second sensor 120. The first sensing modality entails 
detecting motion of the frame of reference with respect to 
the environment. The first sensing modality expresses a 
constraint as to consecutive poses and a motion measure- 
ment. This motion can be sensed using a direct motion 
sensor, a motion tracking sensor, a motion inferring sensor 
or based on feature observations. The second sensing modal- 
ity includes feature observations and is a function of a 
particular feature and a particular pose. 

System 100 includes processor 130 configured to receive 
data from the one or more sensors. Processor 130, in one 
example, includes instructions for implementing an algo- 
rithm to process the data and derive navigation information. 
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Processor 130, in one example, implements a filter algo- 
rithm, such as a Kalman filter or an extended Kalman filter 
(EKF). 

Data acquired using the feature tracking sensor and a 
motion sensor (or motion-inferring sensor or a motion 
tracking model) is processed by an algorithm. The algorithm 
has complexity linear with the number of tracked features. 
Linear complexity means that complexity of the calculation 
doubles with a doubling of the number of features that are 
tracked. In order to obtain linear complexity, the algorithm 
uses feature measurements for imposing constraints between 
the poses. This is achieved by projecting the residual equa- 
tion of the filter to remove dependency of the residual on the 
feature error or higher order terms. 

Processor 130 provides an output to output device 140. 
Output device 140, in various examples, includes a memory 
or other storage device, a visible display, a printer, an 
actuator (configured to manipulate a hardware device), and 
a controller (configured to control another system). 

A number of output results are contemplated. For 
example, the algorithm can be configured to determine a 
position of a particular feature. The feature is among those 
tracked by one of the sensors and its position is described as 
a point in three-dimensional space. The results can include 
navigation information for the frame of reference. For 
example, a position, attitude, orientation, velocity, accelera- 
tion or other higher order derivative with respect to time can 
be calculated. In one example, the results include the pose 
for the frame of reference. The pose includes a description 
of position and attitude. Orientation refers to a single degree 
of freedom and is commonly referred to as heading. Atti- 
tude, on the other hand, includes the three dimensions of 
roll, pitch and yaw. 

The output can include a position, an orientation, a 
velocity (linear or rotational), acceleration (linear or rota- 
tional) or a higher order derivative of position with respect 
to time. 

The output can be of any dimension, including 1-dimen- 
sional, 2-dimensional or 3-dimensional. 

The frame of reference can include, for example, an 
automobile, a vehicle, or a pedestrian. The sensors are in 
communication with a processor, as shown in FIG. 2, and 
provide data relative to the frame of reference. For example, 
a particular sensor can have multiple components with one 
portion affixed to the frame of reference and another portion 
affixed to the environment. 

In other examples, a portion of a sensor is decoupled from 
the frame of reference and provides data to a remote 
processor. 

In one example, a feature in space describes a particular 
point, and thus, its position within an environment can be 
identified using three degrees of freedom. In contrast to a 
feature, the frame of reference can be viewed as a rigid body 
having six degrees of freedom. In particular, the degrees of 
freedom for a frame of reference can be described as moving 
up and down, moving left and right, moving forward and 
backward, tilting up and down (pitch), turning left and right 
(yaw), and tilting side to side (roll). 

Consider next an example of an Extended Kalman Filter 
(EKF)-based algorithm for real-time vision-aided inertial 
navigation. This example includes derivation of a measure- 
ment model that is able to express the geometric constraints 
that arise when a static feature is observed from multiple 
camera poses. This measurement model does not require 
including the 3D feature position in the state vector of the 
EKF and is optimal, up to linearization errors. The vision- 
aided inertial navigation algorithm has computational com- 
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plexity that is linear in the number of features, and is capable 
of high-precision pose estimation in large-scale real-world 
environments. The performance of the algorithm can be 
demonstrated with experimental results involving a camera/ 
IMU system localizing within an urban area. 

Introduction 

Vision-aided inertial navigation has benefited from recent 
advances in the manufacturing of MEMS-based inertial 
sensors. Such sensors have enabled small, inexpensive, and 
very accurate Inertial Measurement Units (IMUs), suitable 
for pose estimation in small-scale systems such as mobile 
robots and unmanned aerial vehicles. These systems often 
operate in urban environments where GPS signals are unre- 
liable (the “urban canyon”), as well as indoors, in space, and 
in several other environments where global position mea- 
surements are unavailable. 

Visual sensing provides images with high-dimensional 
measurements, and having rich information content. A fea- 
ture extraction method can be used to detect and track 
hundreds of features in images. However, the high volume 
of data also poses a significant challenge for estimation 
algorithm design. When real-time localization performance 
is required, there is a fundamental trade-off between the 
computational complexity of an algorithm and the resulting 
estimation accuracy. 

The present algorithm can be configured to optimally 
utilize the localization information provided by multiple 
measurements of visual features. When a static feature is 
viewed from several camera poses, it is possible to define 
geometric constraints involving all these poses. This docu- 
ment describes a model for expressing these constraints 
without including the 3D feature position in the filter state 
vector, resulting in computational complexity only linear in 
the number of features. 

The Simultaneous Localization and Mapping (SLAM) 
paradigm refers to a family of algorithms for fusing inertial 
measurements with visual feature observations. In these 
methods, the current IMU pose, as well as the 3D positions 
of all visual landmarks are jointly estimated. These 
approaches share the same basic principles with SLAM- 
based methods for camera-only localization, with the dif- 
ference that IMU measurements, instead of a statistical 
motion model, are used for state propagation. SLAM-based 
algorithms account for the correlations that exist between 
the pose of the camera and the 3D positions of the observed 
features. SLAM-based algorithms, on the other hand, suffer 
high computational complexity; properly treating these cor- 
relations is computationally costly, and thus performing 
vision-based SLAM in environments with thousands of 
features remains a challenging problem. 

The present subject matter includes an algorithm that 
expresses constraints between multiple camera poses, and 
thus attains higher estimation accuracy, in cases where the 
same feature is visible in more than two images. 

The multi-state constraint filter of the present subject 
matter exploits the benefits of delayed linearization while 
having complexity only linear in the number of features. By 
directly expressing the geometric constraints between mul- 
tiple camera poses it avoids the computational burden and 
loss of information associated with pairwise displacement 
estimation. Moreover, in contrast to SLAM-type 
approaches, it does not require the inclusion of the 3D 
feature positions in the filter state vector, but still attains 
optimal pose estimation. 

Estimator Description 

A goal of the proposed EKF-based estimator is to track the 

3D pose of the IMU-aflixed frame {I} with respect to a 
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global frame of reference {G}. In order to simplify the 
treatment of the effects of the earth’s rotation on the IMU 
measurements (cf. Equations 7 and 8), the global frame is 
chosen as an Earth-Centered, Earth-Fixed (ECEF) frame. An 
overview of the algorithm is given in Table 1. 


TABLE 1 
Multi-State Constraint Filter 


Propagation for each IMU measurement received, propagate the filter 


state and covariance. 


Image Every time a new image is recorded: 

registration augment the state and covariance matrix with a copy of 
the current camera pose estimate; and 
image processing module begins operation. 

Update when the feature measurements of a given image become 
available, perform an EKF update. 


The IMU measurements are processed immediately as 
they become available, for propagating the EKF state and 
covariance. On the other hand, each time an image is 
recorded, the current camera pose estimate is appended to 
the state vector. State augmentation is used for processing 
the feature measurements, since during EKF updates the 
measurements of each tracked feature are employed for 
imposing constraints between all camera poses from which 
the feature was seen. Therefore, at any time instant the EKF 
state vector comprises (i) the evolving IMU state, X,,,,,, and 
(1) a history of up to N,,,,, past poses of the camera. The 
various components of the algorithm are described in detail 
below. 

A. Structure of the EKF State Vector 
The evolving IMU state is described by the vector: 


Xu = (eq? by vt bE pty (Bgaatont) 


where “,,q is the unit quaternion describing the rotation from 
frame {G} to frame {I}, %p, and @v, are the IMU position 
and velocity with respect to {G}, and b, and b, are 3x1 
vectors that describe the biases affecting the gyroscope and 
accelerometer measurements, respectively. The IMU biases 
are modeled as random walk processes, driven by the white 
Gaussian noise vectors n,,,, and n,,,,, respectively. Following 
Eq. (1), the IMU error-state is defined as: 


Pa xT cot 7? Gxty Equation 2 
X mu = [607 b, Sop b, Spr] (Equation 2) 


For the position, velocity, and biases, the standard addi- 
tive error definition is used (i.e., the error in the estimate X 
of a quantity x is defined as X=x-x). However, for the 
quaternion a different error definition is employed. In par- 
ticular, if q is the estimated value of the quaternion q, then 
the orientation error is described by the error quaternion 6q, 
which is defined by the relation q=5q © q. In this expression, 
the symbol ® denotes quaternion multiplication. The error 
quaternion is 


1 T Equation 3 
og = [500 i eq ‘ 


Intuitively, the quaternion 8q describes the (small) rota- 
tion that causes the true and estimated attitude to coincide. 
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Since attitude corresponds to 3 degrees of freedom, using 50 
to describe the attitude errors is a minimal representation. 

Assuming that N camera poses are included in the EKF 
state vector at time-step k, this vector has the following 
form: 


Ree an cat Sp var Ger i (Equation 4) 
where 

aq 
are “pg, i=l... N the estimates of the camera attitude and 


position, respectively. The EKF errorstate vector is defined 
accordingly: 


TGx mT 
] 


R= Ray 806 Be" - . - ey Boy (Equation 5) 


B. Propagation 

The filter propagation equations are derived by discreti- 
zation of the continuous-time IMU system model, as 
described in the following: 

1) Continuous-Time System Modeling: 

The time evolution of the IMU state is described by: 


; 1 ; tion 6 
GAO) = 5 MOONGTD, be(O) = Mg Se 


Fi (0) = Sala), ba) = Mya), ©) = Fv) 


In these expressions, a is the body acceleration in the 
global frame, m=[, ©, w,|” is the rotational velocity 
expressed in the IMU frame, and 


The gyroscope and accelerometer measurements, w,,, and 
a,, respectively, are given by: 


Wm = + C(L@we + bg tng (Equation 7) 


am = CG Ca — Sg + 2Lwg x|9v; + log x?" p)) + Ba +g (Equation 8) 


where C(*) denotes a rotational matrix, and n, and n, are 
zero-mean, white Gaussian noise processes modeling the 
measurement noise. Note that the IMU measurements incor- 
porate the effects of the planet’s rotation, w,. Moreover, the 
accelerometer measurements include the gravitational accel- 
eration, “g, expressed in the local frame. 

Applying the expectation operator in the state propagation 
equations (Equation 6) yields the equations for propagating 
the estimates of the evolving IMU state: 


ee See (Equation 9) 
64 = 5 MONG, by = Osx, 
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-continued 


$9, = Cha 2Lwg xI%1 — Loe XP OB, +8 


3 rer Ga 
bg = 3x15 7° By = "Fy 


where, for brevity, denote 


and o=0,,-b.-C gc- The linearized continuous-time model 
for the IMU error-state is: 


XorF: Xet Gree (Equation 10) 


where Dal,” ei n,” 0,4 |" is the system noise. The 
covariance matrix of Nye, Qryz» depends on the IMU noise 
characteristics and is computed off-line during sensor cali- 
bration. The matrices F and G that appear in Equation 10 are 
given by: 


-l®x] -b O3x3 033 O33 
03x3 3x3 03.3 03x3 3x3 
F= -ChLa x] 03.3 —2Lwe x] -G -Lwe x]? 
03x3 3x3 03.3 03x3 O3x3 
03x3 033 Is 03x3 03x3 


where I, is the 3x3 identity matrix, and 


—1s 03x3 03x3 033 


03x3 03.3 


Tr 
-C, 


03.3 «6 


033 O3x3 033 


03x3 03x3 03.3 


03x3 03x3 03x3 033 


2) Discrete-Time Implementation 

The IMU samples the signals w,, and a,, with a period T, 
and these measurements are used for state propagation in the 
EKF. Every time a new IMU measurement is received, the 
IMU state estimate is propagated using 5th order Runge- 
Kutta numerical integration of Equation 9. The EKF cova- 
riance matrix is also propagated. For this purpose, consider 
the following partitioning for the covariance: 


Pity Pic (Equation 11) 


Pre = 


r 
Pic, Pc 


where P,,, is the 15x15 covariance matrix of the evolving 
IMU state, PCCklk is the 6Nx6N covariance matrix of the 
camera pose estimates, and P¢¢,, is the correlation between 
the errors in the IMU state and the camera pose estimates. 
With this notation, the covariance matrix of the propagated 
state is given by: 


Pit siy DR +7. KP icy 
Prsik = 


Phe Mh +74)" Prey 


US 9,766,074 B2 


9 


where P;,,,, 1s computed by numerical integration of the 
Lyapunov equation: 


Py=FP tPF '+GOpuG (Equation 12) 


Numerical integration is carried out for the time interval 
(t,, t,+T), with initial condition P,,,,. The state transition 
matrix @(t,+T, t,) is similarly computed by numerical inte- 
gration of the differential equation 


O(t,+0,t,) =F O(h47,4,), te[0,T] (Equation 13) 


with initial condition P(t,, t,)=I,5. 
C. State Augmentation 

Upon recording a new image, the camera pose estimate is 
computed from the IMU pose estimate as: 


5 4 CT ye (Equation 14) 
q 


isin" 
QI 


is the quaternion expressing the rotation between the IMU 
and camera frames, and ’p,. is the position of the origin of the 
camera frame with respect to {I}, both of which are known. 
This camera pose estimate is appended to the state vector, 
and the covariance matrix of the EKF is augmented accord- 


ingly: 


Ten+is T6n+15 r (Equation 15) 


a al 


where the Jacobian J is derived from Equation 14 as: 


ru | 


CGA — 03x9 03.3 O3x6Nn 


(Equation 16) 
7 [cf pcx | Oso | 


O3x6N 


D. Measurement Model 

Consider next the measurement model employed for 
updating the state estimates. Since the EKF is used for state 
estimation, for constructing a measurement model it suffices 
to define a residual, r, that depends linearly on the state 
errors, X, according to the general form: 


r=HX+noise (Equation 17) 


In this expression H is the measurement Jacobian matrix, 
and the noise term must be zero-mean, white, and uncorre- 
lated to the state error, for the EXF framework to be applied. 

Viewing a static feature from multiple camera poses 
results in constraints involving all these poses. Here, the 
camera observations are grouped per tracked feature, rather 
than per camera pose where the measurements were 
recorded. All the measurements of the same 3D point are 
used to define a constraint equation (cf. Equation 24), 
relating all the camera poses at which the measurements 
occurred. This is achieved without including the feature 
position in the filter state vector. 
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Consider the case of a single feature, f,, 


observed from a set of M, camera poses 


that has been 


model: 


(Equation 18) 


where n,” is the 2x1 image noise vector, with covariance 
matrix R,”=o,,,"I,. The feature position expressed in the 
camera frame, “Des is given by: 

fi 


CG X; (Equation 19) 


G Psy = Gy; |= CE a (Spy; -—pe;) 


az; 


where @ Py is the 3D feature position in the global frame. 
Since this is unknown, in the first step of the algorithm, 
employ a least-squares minimization to obtain an estimate, 
Bys of the feature position. This is achieved using the 
measurements z,”, ieS,, and the filter estimates of the 
camera poses at the corresponding time instants. 
Following the estimation of the feature position, compute 
the measurement residual: 


rv) = wy = (Equation 20) 
where 
Gx, 
peel 
i GZ; CG P; 
CX. 
J 
» Gales 
oP, |=c¢ NCOP ,, -°Be) 
or, 


Linearizing about the estimates for the camera pose and for 
the feature position, the residual of Equation 20 can be 
approximated as: 


1 PnH, OX+H, PD, 


Bythi » (Equation 21) 


In the preceding expression H, ” and Hy ® are the Jacobians 
of the measurement z,” with’ respect 40 the state and the 
feature position, respectively, and © By is the error in the 
position estimate of f,. The exact values of the Jacobians in 
this expression are generally available. Stacking the residu- 
als of - M, measurements of this feature yields: 

wT OHA, Py +n? (Equation 22) 


where . H.”, H 9, and n” are olnce vectors or matrices 
with elements 1, H, oO, Hy , and n,”, for ieS,. Since the 
feature observations i in different i images are independent, the 
covariance matrix of n® is RY=o, m lang: 

Note that since the state estimate, X, is used to compute 
the feature position estimate, the error By in Equation 22 is 
correlated with the errors X. Thus, the residual r is not in 
the form of Equation 17, and cannot be directly applied for 
measurement updates in the EKF. To overcome this, define 
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a residual r,”, by projecting r” on the left nullspace of the 
matrix FEO Specifically, let A denote the unitary matrix 
whose columns form the basis of the left nullspace of H, to 
obtain: 


pr P=AT(gP-2 we TH OX ATH (Equation 23) 


=H PRO 40 (Equation 24) 


Since the 2M,x3 matrix 13 has full column rank, its left 
nullspace is of dimension 2M,-3. Therefore, 1,” is a (2M,- 
3)x1 vector. This residual is independent of the errors in the 
feature coordinates, and thus EKF updates can be performed 
based on it. Equation 24 defines a linearized constraint 
between all the camera poses from which the feature f; was 
observed. This expresses all the available information that 
the measurements z,” provide for the M, states, and thus the 
resulting EKF update is optimal, except for the inaccuracies 
caused by linearization. 

In order to compute the residual r,” and the measurement 
matrix H,”, the unitary matrix A does not need to be 
explicitly evaluated. Instead, the projection of the vector r 
and the matrix H,” on the nullspace of HY? can be computed 
very efficiently using Givens O(M,’) operations. Addition- 
ally, since the matrix A is unitary, the covariance matrix of 
the noise vector n,” is given by: 


E fn Pn? }=Oy7ATA=Oyn?7 Dongs 


The residual defined in Equation 23 is not the only 
possible expression of the geometric constraints that are 
induced by observing a static feature in M, images. An 
alternative approach is, for example, to employ the epipolar 
constraints that are defined for each of the M, (M,-1)/2 pairs 
of images. However, the resulting M, (M,-1)/2 equations 
would still correspond to only 2M,-3 independent con- 
straints, since each measurement is used multiple times, 
rendering the equations statistically correlated. Experimen- 
tal data shows that employing linearization of the epipolar 
constraints results in a significantly more complex imple- 
mentation, and yields inferior results compared to the 
approach described above. 

E. EKF Updates 

The preceding section presents a measurement model that 
expresses the geometric constraints imposed by observing a 
static feature from multiple camera poses. Next, consider the 
update phase of the EKF, in which the constraints from 
observing multiple features are used. EKF updates are 
triggered by one of the following two events: 

When a feature that has been tracked in a number of 
images is no longer detected, then all the measurements 
of this feature are processed using the method pre- 
sented above in the section concerning Measurement 
Model. This case occurs most often, as features move 
outside the camera’s field of view. 

Every time a new image is recorded, a copy of the current 
camera pose estimate is included in the state vector (see 
the section concerning State Augmentation). If the 
maximum allowable number of camera poses, N,,,..5 
has been reached, at least one of the old ones must be 
removed. Prior to discarding states, all the feature 
observations that occurred at the corresponding time 
instants are used, in order to utilize their localization 
information. In one example, choose N,,,,,/3 poses that 
are evenly spaced in time, starting from the second- 
oldest pose. These are discarded after carrying out an 
EKF update using the constraints of features that are 
common to these poses. One example always retains 
the oldest pose in the state vector, because the geomet- 
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ric constraints that involve poses further back in time 
typically correspond to larger baseline, and hence carry 
more valuable positioning information. 

Consider next the update process. At a given time step the 
constraints of L features, selected by the above two criteria, 
must be processed. Following the procedure described in the 
preceding section, compute a residual vector r,, j=1... L, 
as well as a corresponding measurement matrix H,”, j= 
1 ...L for each of these features (cf. Equation 23). Stacking 
all residuals in a single vector yields: 


r,=HX4n, (Equation 25) 


where r, and n, are vectors with block elements r,” and n,, 
j=l... L, respectively, and H, is a matrix with block rows 
H,™, j=l... L. 

Since the feature measurements are statistically indepen- 
dent, the noise vectors n,” are uncorrelated. Therefore, the 
covariance matrix of the noise vector n, is equal to 
R,=0,,, lz, Where d=X,_,“(2M,-3) is the dimension of the 
residual r,. In practice, d can be a quite large number. For 
example, if 10 features are seen in 10 camera poses each, the 
dimension of the residual is 170. In order to reduce the 
computational complexity of the EKF update, employ the 
QR decomposition of the matrix H,,. Specifically, denote this 
decomposition as 


T; 
H, = (0) 2a)| 4 


where Q, and Q, are unitary matrices whose columns form 
bases for the range and nullspace of H,, respectively, and T,, 
is an upper triangular matrix. With this definition, Equation 
25 yields: 


Ty |. (Equation 26) 
ro =([Q1 2a] 0 X+N.> 
ofr, ‘0 TH |. Qin, (Equation 27) 
= + 
| Oro 0 Qin, | 


From the last equation it becomes clear that by projecting 
the residual r, on the basis vectors of the range of H,, all the 
useful information in the measurements is retained. The 
residual Q,’r, is only noise, and can be completely dis- 
carded. For this reason, instead of the residual shown in 
Equation 25, employ the following residual for the EKF 


update: 


1,=O)'r,=TyX+n,, (Equation 28) 


In this expression n,=Q,7n, is a noise vector whose cova- 
riance matrix is equal to R,=Q,’R,Q,=6,,,"I,, with r being 
the number of columns in Q,. The EKF update proceeds by 


computing the Kalman gain: 
K=PT 7 (TyPT 77! +R,) + (Equation 29) 


while the correction to the state is given by the vector 


AX=Kr,, (Equation 30) 


In addition, the state covariance matrix is updated according 
to: 
Preetieet=Ue-KT rp)Pe ts Fe-KT pp) +KR,,K™ (Equation 31) 


where E=6N+15 is the dimension of the covariance matrix. 
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Consider the computational complexity of the operations 
needed during the EKF update. The residual r,,, as well as the 
matrix T,,, can be computed using Givens rotations in O(r7d) 
operations, without the need to explicitly form Q,. On the 
other hand, Equation 31 involves multiplication of square 
matrices of dimension €, an O(E*) operation. Therefore, the 
cost of the EKF update is max (O(r7d),O(€*)). If, on the 
other hand, the residual vector r, was employed, without 
projecting it on the range of H,, the computational cost of 
computing the Kalman gain would have been O(d?). Since 
typically d>>§, r, the use of the residual r, results in 
substantial savings in computation. 

Discussion 

Consider next some of the properties of the described 
algorithm. As shown elsewhere in this document, the filter’s 
computational complexity is linear in the number of 
observed features, and at most cubic in the number of states 
that are included in the state vector. Thus, the number of 
poses that are included in the state is the most significant 
factor in determining the computational cost of the algo- 
rithm. Since this number is a selectable parameter, it can be 
tuned according to the available computing resources, and 
the accuracy requirements of a given application. In one 
example, the length of the filter state is adaptively controlled 
during filter operation, to adjust to the varying availability of 
resources. 

One source of difficulty in recursive state estimation with 
camera observations is the nonlinear nature of the measure- 
ment model. Vision-based motion estimation is very sensi- 
tive to noise, and, especially when the observed features are 
at large distances, false local minima can cause convergence 
to inconsistent solutions. The problems introduced by non- 
linearity can be addressed using techniques such as Sigma- 
point Kalman filtering, particle filtering, and the inverse 
depth representation for features. The algorithm is robust to 
linearization inaccuracies for various reasons, including (i) 
the inverse feature depth parametrization used in the mea- 
surement model and (ii) the delayed linearization of mea- 
surements. According to the present subject matter, multiple 
observations of each feature are collected prior to using 
them for EKF updates, resulting in more accurate evaluation 
of the measurement Jacobians. 

In typical image sequences, most features can only be 
reliably tracked over a small number of frames (“opportu- 
nistic” features), and only a few can be tracked for long 
periods of time, or when revisiting places (persistent fea- 
tures). This is due to the limited field of view of cameras, as 
well as occlusions, image noise, and viewpoint changes, that 
result in failures of the feature tracking algorithms. If all the 
poses in which a feature has been seen are included in the 
state vector, then the proposed measurement model is opti- 
mal, except for linearization inaccuracies. Therefore, for 
realistic image sequences, the present algorithm is able to 
use the localization information of the opportunistic fea- 
tures. Also note that the state vector X, is not required to 
contain only the IMU and camera poses. In one example, the 
persistent features can be included in the filter state, and 
used for SLAM. This can improve the attainable localization 
accuracy within areas with lengthy loops. 


Experimental Example 
The algorithm described herein has been tested both in 


simulation and with real data. Simulation experiments have 
verified that the algorithm produces pose and velocity esti- 
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mates that are consistent, and can operate reliably over long 
trajectories, with varying motion profiles and density of 
visual features. 

The following includes the results of the algorithm in an 
outdoor experiment. The experimental setup included a 
camera/IMU system, placed on a car that was moving on the 
streets of a typical residential area in Minneapolis, Minn. 
The system included a Pointgrey FireFly camera, registering 
images of resolution 640x480 pixels at 3 Hz, and an Inertial 
Science ISIS IMU, providing inertial measurements at a rate 
of 100 Hz. During the experiment all data were stored on a 
computer and processing was done off-line. Some example 
images from the recorded sequence are shown in FIG. 3. 

The recorded sequence included a video of 1598 images 
representing about 9 minutes of driving. 

For the results shown here, feature extraction and match- 
ing was performed using the SIFT algorithm. During this 
run, a maximum of 30 camera poses was maintained in the 
filter state vector. Since features were rarely tracked for 
more than 30 images, this number was sufficient for utilizing 
most of the available constraints between states, while 
attaining real-time performance. Even though images were 
only recorded at 3 Hz due to limited hard disk space on the 
test system, the estimation algorithm is able to process the 
dataset at 14 Hz, on a single core of an Intel T7200 processor 
(2 GHz clock rate). During the experiment, a total of 142903 
features were successfully tracked and used for EKF 
updates, along a 3.2 km-long trajectory. The quality of the 
position estimates can be evaluated using a map of the area. 

In FIG. 4, the estimated trajectory is plotted on a map of 
the neighborhood where the experiment took place. The 
initial position of the car is denoted by a red square on SE 
19” Avenue, and the scale of the map is shown on the top 
left corner. 

FIG. 5 illustrates the 30 bounds for the errors in the 
position, attitude, and velocity. The plotted values are 
3-times the square roots of the corresponding diagonal 
elements of the state covariance matrix. Note that the EKF 
state is expressed in ECEF frame, but for plotting, all 
quantities have been transformed in the initial IMU frame, 
whose x axis is pointing approximately south, and its y axis 
east. 

The trajectory follows the street layout quite accurately 
and, additionally, the position errors that can be inferred 
from this plot agree with the 30 bounds shown in FIG. 5A. 
The final position estimate, expressed with respect to the 
starting pose, is X nai 1-7-92 13.14 -0.78]’m. From the 
initial and final parking spot of the vehicle it is known that 
the true final position expressed with respect to the initial 
pose is approximately X,,,,;-[0 7 0] ™m. Thus, the final 
position error is approximately 10 m in a trajectory of 3.2 
km, i.e., an error of 0.31% of the traveled distance. This is 
remarkable, given that the algorithm does not utilize loop 
closing, and uses no prior information (for example, non- 
holonomic constraints or a street map) about the car motion. 
Note also that the camera motion is almost parallel to the 
optical axis, a condition which is particularly adverse for 
image-based motion estimation algorithms. In FIG. 5B and 
FIG. 5C, the 30 bounds for the errors in the IMU attitude and 
velocity along the three axes are shown. From these, observe 
that the algorithm obtains accuracy (30) better than 1° for 
attitude, and better than 0.35 m/sec for velocity in this 
particular experiment. 

The results demonstrate that the algorithm is capable of 
operating in a real-world environment, and producing very 
accurate pose estimates in real-time. Note that in the dataset 
presented here, several moving objects appear, such as cars, 
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pedestrians, and trees whose leaves move in the wind. The 
algorithm is able to discard the outliers which arise from 
visual features detected on these objects, using a simple 
Mahalanobis distance test. Robust outlier rejection is facili- 
tated by the fact that multiple observations of each feature 
are available, and thus visual features that do not correspond 
to static objects become easier to detect. Note also that the 
method can be used either as a stand-alone pose estimation 
algorithm, or combined with additional sensing modalities 
to provide increased accuracy. For example, a GPS sensor 
(or other type of sensor) can be used to compensate for 
position drift. 


The present subject matter includes an EKF-based esti- 
mation algorithm for real-time vision-aided inertial naviga- 
tion. One aspect of this work is the derivation of a mea- 
surement model that is able to express the geometric 
constraints that arise when a static feature is observed from 
multiple camera poses. This measurement model does not 
require including the 3D feature positions in the state vector 
of the EKF, and is optimal, up to the errors introduced by 
linearization. The resulting EKF-based pose estimation 
algorithm has computational complexity linear in the num- 
ber of features, and is capable of very accurate pose esti- 
mation in large-scale real environments. One example 
includes fusing inertial measurements with visual measure- 
ments from a monocular camera. However, the approach is 
general and can be adapted to different sensing modalities 
both for the proprioceptive, as well as for the exteroceptive 
measurements (e.g., for fusing wheel odometry and laser 
scanner data). 


Selected Calculations 


Intersection can be used to compute an estimate of the 
position of a tracked feature f,. To avoid local minima, and 
for better numerical stability, during this process, use an 
inverse-depth parametrization of the feature position. In 
particular, if {Cn} is the camera frame in which the feature 
was observed for the first time, then the feature coordinates 
with respect to the camera at the i-th time instant are: 


aq)’ peed Equation 32 
“pg = C(6,9) "PR + PCy FES; (Eq ) 


In this expression 


and “‘p,. are the rotation and translation between the camera 
frames at time instants n and i, respectively. Equation 32 can 
be rewritten as: 


(Equation 33) 


Qj (Equation 34) 
Bi] +2)" Bo, 
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-continued 
hi (aj, Bj, Pj) 
= %Z,] hi2(@;, Bj, Pj) 
hj3(@;, Bj, Pj) 


(Equation 35) 


In the last expression h,,, h,. and h,, are scalar functions of 
the quantities o.,,f,,0,, which are defined as: 


(Equation 36) 


Substituting from Equation 35 into Equation 18, express the 
measurement equations as functions of a,, B, and p, only: 


P (Equation 37) 
+ni? 


w@ 1 | 
gy = 
hi3(@j, Bj, Pj)| hi2(@;, Bj, P;) 


Given the measurements z,”, ieS,, and the estimates for the 
camera poses in the state vector, obtain estimates for Oy, 6, 
and p,, using Gauss-Newton least squares minimization. 
Then, the global feature position is computed by: 


aj (Equation 38) 


Note that during the least-squares minimization process the 
camera pose estimates are treated as known constants, and 
their covariance matrix is ignored. As a result, the minimi- 
zation can be carried out very efficiently, at the expense of 
the optimality of the feature position estimates. Recall, 
however, that up to a first-order approximation, the errors in 
these estimates do not affect the measurement residual (cf. 
Equation 23). Thus, no significant degradation of perfor- 
mance is inflicted. 


Additional Notes 


The above detailed description includes references to the 
accompanying drawings, which form a part of the detailed 
description. The drawings show, by way of illustration, 
specific embodiments in which the invention can be prac- 
ticed. These embodiments are also referred to herein as 
“examples.” Such examples can include elements in addi- 
tion to those shown and described. However, the present 
inventors also contemplate examples in which only those 
elements shown and described are provided. 

All publications, patents, and patent documents referred 
to in this document are incorporated by reference herein in 
their entirety, as though individually incorporated by refer- 
ence. In the event of inconsistent usages between this 
document and those documents so incorporated by refer- 
ence, the usage in the incorporated reference(s) should be 
considered supplementary to that of this document; for 
irreconcilable inconsistencies, the usage in this document 
controls. 

In this document, the terms “a” or “an” are used, as is 
common in patent documents, to include one or more than 
one, independent of any other instances or usages of “at least 
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one” or “one or more.” In this document, the term “or” is 
used to refer to a nonexclusive or, such that “A or B” 
includes “A but not B.” “B but not A,” and “A and B,” unless 
otherwise indicated. In the appended claims, the terms 
“including” and “in which” are used as the plain-English 
equivalents of the respective terms “comprising” and 
“wherein.” Also, in the following claims, the terms “includ- 
ing” and “comprising” are open-ended, that is, a system, 
device, article, or process that includes elements in addition 
to those listed after such a term in a claim are still deemed 
to fall within the scope of that claim. Moreover, in the 
following claims, the terms “first,” “second,” and “third,” 
etc. are used merely as labels, and are not intended to impose 
numerical requirements on their objects. 

Method examples described herein can be machine or 
computer-implemented at least in part. Some examples can 
include a computer-readable medium or machine-readable 
medium encoded with instructions operable to configure an 
electronic device to perform methods as described in the 
above examples. An implementation of such methods can 
include code, such as microcode, assembly language code, 
a higher-level language code, or the like. Such code can 
include computer readable instructions for performing vari- 
ous methods. The code may form portions of computer 
program products. Further, the code may be tangibly stored 
on one or more volatile or non-volatile computer-readable 
media during execution or at other times. These computer- 
readable media may include, but are not limited to, hard 
disks, removable magnetic disks, removable optical disks 
(for example, compact disks and digital video disks), mag- 
netic cassettes, memory cards or sticks, random access 
memories (RAMs), read only memories (ROMs), and the 
like. 

The above description is intended to be illustrative, and 
not restrictive. For example, the above-described examples 
(or one or more aspects thereof) may be used in combination 
with each other. Other embodiments can be used, such as by 
one of ordinary skill in the art upon reviewing the above 
description. The Abstract is provided to comply with 37 
C.E.R. §1.72(b), to allow the reader to quickly ascertain the 
nature of the technical disclosure. It is submitted with the 
understanding that it will not be used to interpret or limit the 
scope or meaning of the claims. Also, in the above Detailed 
Description, various features may be grouped together to 
streamline the disclosure. This should not be interpreted as 
intending that an unclaimed disclosed feature is essential to 
any claim. Rather, inventive subject matter may lie in less 
than all features of a particular disclosed embodiment. Thus, 
the following claims are hereby incorporated into the 
Detailed Description, with each claim standing on its own as 
a separate embodiment. The scope of the invention should 
be determined with reference to the appended claims, along 
with the full scope of equivalents to which such claims are 
entitled. 

What is claimed is: 

1. A vision-aided inertial navigation system comprising: 

at least one image source to produce image data for a 

plurality of poses of a frame of reference along a 
trajectory within an environment over a period of time, 
wherein the image data includes features that were each 
observed within the environment at poses of the frame 
of reference along the trajectory, wherein one or more 
of the features were each observed at multiple ones of 
the poses of the frame of reference along the trajectory; 
a motion sensor configured to provide motion data of the 
frame of reference in the environment for the period of 
time; and 
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a hardware-based processor communicatively coupled to 
the image source and communicatively coupled to the 
motion sensor, the processor configured to compute 
estimates for at least a position and orientation of the 
frame of reference for each of the plurality of poses of 
the frame of reference along the trajectory, 
wherein the processor is configured to: 
determine, from the image data, feature measurements 
corresponding to the features observed from the 
poses along the trajectory; 

group the feature measurements according to the fea- 
tures observed within the image data; 

for one or more of the features observed from multiple 
poses along the trajectory, compute based on the 
respective group of feature measurements for the 
feature, one or more constraints that geometrically 
relate the multiple poses from which the respective 
feature was observed; and 

determine the position and orientation of the frame of 
reference for each of the plurality of poses along the 
trajectory by updating, in accordance with the 
motion data and the one or more computed con- 
straints, state information within a state vector rep- 
resenting estimates for the position and orientation of 
the frame of reference along the trajectory while 
excluding, from the state vector, state information 
representing estimates for positions within the envi- 
ronment for the features that were each observed 
from the multiple poses and for which the one or 
more constraints were computed. 

2. The vision-aided inertial navigation system of claim 1 
wherein the processor is configured to generate each of the 
one or more constraints by manipulating a residual of a 
measurement for the respective feature. 

3. The vision-aided inertial navigation system of claim 1, 

wherein the image source includes a camera, and 

wherein the vision-aided inertial navigation system com- 
prises one of a robot or a vehicle. 

4. The vision-aided inertial navigation system of claim 1 
wherein the motion sensor includes an inertial measurement 
unit (IMU). 

5. The vision-aided inertial navigation system of claim 1 
wherein the processor is configured to implement an 
extended Kalman filter to compute the one or more con- 
straints. 

6. The vision-aided inertial navigation system of claim 1 
further including an output device coupled to the processor, 
the output device including at least one of a memory, a 
transmitter, a display, a printer, an actuator, and a controller. 

7. A method comprising: 

receiving, with a processor and from at least one image 
source communicatively coupled to the processor, 
image data for a plurality of poses of a frame of 
reference along a trajectory within an environment over 
a period of time, wherein the image data includes 
features that were each observed within the environ- 
ment at poses of the frame of reference along the 
trajectory, wherein one or more of the features were 
each observed at multiple ones of the poses of the frame 
of reference along the trajectory; 

receiving, with the processor and from a motion sensor 
communicatively coupled to the processor, motion data 
of the frame of reference in the environment for the 
period of time; 
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computing, with the processor, state estimates for at least 
a position and orientation of the frame of reference for 
each of the plurality of poses of the frame of reference 
along the trajectory, 

wherein computing the state estimates comprises: 

determining, from the image data, feature measure- 
ments corresponding to the features observed from 
the poses along the trajectory; 

grouping the feature measurements according to the 
features observed within the image data; 

for one or more of the features observed from multiple 
poses along the trajectory, computing, based on the 
respective group of feature measurements for the 
feature, one or more constraints that geometrically 
relate the multiple poses from which the respective 
feature was observed; and 

determining the position and orientation of the frame of 
reference for each of the plurality of poses along the 
trajectory by updating, in accordance with the 
motion data and the one or more computed con- 
straints, state information within a state vector rep- 
resenting estimates for the position and orientation of 
the frame of reference along the trajectory while 
excluding, from the state vector, state information 
representing estimates for positions within the envi- 
ronment for the features that were each observed 
from the multiple poses and for which the one or 
more constraints were computed; and 

controlling, responsive to the computed state estimates, 

navigation of the frame of reference. 

8. The method of claim 7 wherein computing one or more 
constraints comprises manipulating a residual of a measure- 
ment of the respective feature to reduce an effect of a feature 
estimate error. 

9. The method of claim 7 wherein computing state esti- 
mates further includes computing states estimates for at least 
a velocity, or acceleration of the frame of reference. 

10. The method of claim 7 wherein receiving image data 
includes receiving data from at least one of a camera-based 
sensor, a laser-based sensor, a sonar-based sensor, and a 
radar-based sensor. 

11. The method of claim 7 wherein receiving motion data 
from the motion sensor includes receiving data from at least 
one of a wheel encoder, a velocity sensor, a Doppler radar 
based sensor, a gyroscope, an accelerometer, an airspeed 
sensor, and a global positioning system (GPS) sensor. 

12. The method of claim 7 wherein computing, based on 
the image data, one or more constraints comprises executing 
an Extended Kalman Filter (EKF). 

13. A non-transitory computer-readable storage medium 
comprising instructions that configure a processor to: 

receive, with the processor and from at least one image 

source communicatively coupled to the processor, 
image data for a plurality of poses of a frame of 
reference along a trajectory within an environment over 
a period of time, wherein the image data includes 
features that were each observed within the environ- 
ment at poses of the frame of reference along the 
trajectory, wherein one or more of the features were 
each observed at multiple ones of the poses of the frame 
of reference along the trajectory; 
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receive, with the processor and from a motion sensor 
communicatively coupled to the processor, motion data 
of the frame of reference in the environment for the 
period of time; 

determine, from the image data, feature measurements 

corresponding to the features observed from the poses 
along the trajectory; 

group the feature measurements according to the features 

observed within the image data; 

for one or more of the features observed from multiple 

poses along the trajectory, compute, based on the 
respective group of feature measurements for the fea- 
ture, one or more constraints that geometrically relate 
the multiple poses from which the respective feature 
was observed; 

determine state estimates for at least a position and an 

orientation of the frame of reference for each of the 
plurality of poses along the trajectory by updating, in 
accordance with the motion data and the one or more 
computed constraints, state information within a state 
vector representing estimates for the position and ori- 
entation of the frame of reference along the trajectory 
while excluding, from the state vector, state estimates 
for positions within the environment for the features 
that were each observed from the multiple poses and 
for which the one or more constraints were computed; 
and 

output, for display, information responsive to the com- 

puted state estimates for the frame of reference. 

14. The computer-readable storage medium of claim 13 
wherein the instructions configure the processor to compute 
the one or more constraints by manipulating a residual of a 
measurement of the respective feature to reduce an effect of 
a feature estimate error. 

15. The computer-readable storage medium of claim 13 
wherein the instructions configure the processor to compute 
states estimates for at least a velocity, or acceleration of the 
frame of reference. 

16. The computer-readable storage medium of claim 13 
wherein the image data includes data from at least one of a 
camera-based sensor, a laser-based sensor, a sonar-based 
sensor, and a radar-based sensor. 

17. The computer-readable storage medium of claim 13 
wherein the motion data includes data from at least one of 
a wheel encoder, a velocity sensor, a Doppler radar based 
sensor, a gyroscope, an accelerometer, an airspeed sensor, 
and a global positioning system (GPS) sensor. 

18. The computer-readable storage medium of claim 13 
wherein the instructions configure the processor to execute 
an Extended Kalman Filter (EKF). 

19. The computer-readable storage medium of claim 13 
wherein one or more of the constraints are computed based 
on the feature measurements for features observed at three 
or more of the poses along the trajectory. 

20. The vision-aided inertial navigation system of claim 1 
wherein the processor computes one or more of the con- 
straints based on the feature measurements for features 
observed at three or more of the poses along the trajectory. 

21. The method of claim 7 wherein one or more of the 
constraints are computed based on the feature measurements 
for features observed at three or more of the poses along the 
trajectory. 


